Maximum entropy direct model as a unified model for acoustic modeling in speech recognition
Abstract
Traditional statistical models for speech recognition have been dominated by generative models such as Hidden Markov Models (HMMs). We recently proposed a new framework for speech recognition based on maximum entropy direct modeling, in which the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, the features can be non-independent, asynchronous, and overlapping. In this paper, we discuss how to make the computationally intensive training of such models feasible by parallelizing the IIS (Improved Iterative Scaling) algorithm. When used as a stand-alone acoustic model, the direct model achieves a significantly lower word error rate than traditional HMMs. When it is combined with HMM and language model scores, modest improvements over the best HMM system are observed. The maximum entropy model can potentially incorporate non-independent features, such as acoustic-phonetic features, in a way that is robust to features that are missing because of a mismatch between training and testing conditions.
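The abstract does not spell out the model or the parallel training scheme, so the following is only a minimal sketch under stated assumptions. It assumes the standard conditional maximum entropy form p(s | o) = exp(Σ_i λ_i f_i(o, s)) / Z(o), binary features, and a parallelization in which the training data is split into shards whose IIS statistics are computed independently and then summed. The helper names (`shard_stats`, `solve_deltas`, `train_iis`, `make_toy_data`) and the toy data are hypothetical; threads stand in for the separate machines a real system would use. This is not the authors' implementation.

```python
# Hypothetical sketch: conditional maxent "direct model" p(s | o) trained by IIS,
# with the training data split into shards whose statistics are summed.
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def probs(lam, F):
    """p(s | o) for every example; F has shape (N examples, S labels, I features)."""
    scores = F @ lam                                      # sum_i lam_i f_i(o, s)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)              # divide by Z(o)


def shard_stats(lam, F, y, max_fsum):
    """Statistics one worker computes on its shard. Assumes binary features, so
    f#(o, s) = sum_i f_i(o, s) is an integer in 0..max_fsum."""
    n = F.shape[0]
    emp = F[np.arange(n), y].sum(axis=0)         # empirical feature counts
    p = probs(lam, F)
    fsum = np.rint(F.sum(axis=2)).astype(int)    # f#(o, s)
    coef = np.zeros((F.shape[2], max_fsum + 1))
    for m in range(max_fsum + 1):
        # coef[i, m] = sum of p(s | o) f_i(o, s) over pairs (o, s) with f#(o, s) == m
        coef[:, m] = np.einsum("ns,nsi->i", p * (fsum == m), F)
    return emp, coef


def solve_deltas(emp, coef, newton_steps=30):
    """IIS update: for each feature i, solve sum_m coef[i, m] exp(delta_i m) = emp[i]."""
    m = np.arange(coef.shape[1])
    delta = np.zeros(len(emp))
    active = emp > 0  # features never seen in training are left unchanged here
    for _ in range(newton_steps):
        e = np.exp(np.outer(delta, m))
        g = (coef * e).sum(axis=1) - emp
        dg = (coef * e * m).sum(axis=1)
        ok = active & (dg > 0)
        step = np.where(ok, g / np.where(ok, dg, 1.0), 0.0)
        delta -= np.clip(step, -2.0, 2.0)  # damp large Newton steps
    return delta


def train_iis(shards, num_features, max_fsum, iters=100):
    """Each iteration maps shard_stats over the shards in parallel, then sums."""
    lam = np.zeros(num_features)
    with ThreadPoolExecutor() as pool:  # threads stand in for separate machines
        for _ in range(iters):
            stats = list(pool.map(lambda sh: shard_stats(lam, sh[0], sh[1], max_fsum), shards))
            emp = sum(s[0] for s in stats)
            coef = sum(s[1] for s in stats)
            lam = lam + solve_deltas(emp, coef)
    return lam


def make_toy_data(n=300, d=3, num_labels=3, seed=0):
    """Hypothetical toy data: random binary observation bits plus a bias bit, and
    features f_{k,c}(o, s) = x_k(o) * [s == c]. Labels are random (not separable)."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([np.ones((n, 1)), rng.integers(0, 2, size=(n, d))], axis=1)
    y = rng.integers(0, num_labels, size=n)
    F = np.zeros((n, num_labels, (d + 1) * num_labels))
    for c in range(num_labels):
        F[:, c, c * (d + 1):(c + 1) * (d + 1)] = x
    return F, y


if __name__ == "__main__":
    F, y = make_toy_data()
    shards = [(F[i::4], y[i::4]) for i in range(4)]
    lam = train_iis(shards, num_features=F.shape[2], max_fsum=4)
    model = np.einsum("ns,nsi->i", probs(lam, F), F)
    emp = F[np.arange(len(y)), y].sum(axis=0)
    print("max constraint violation:", np.abs(emp - model).max())
```

In this sketch, the per-shard quantities are plain sums, so workers only need to exchange an (features × f#) table of coefficients per iteration; the per-feature Newton solve that produces each weight update is cheap and runs centrally.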
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Discriminative maximum entropy language model for speech recognition
This paper presents a new discriminative language model based on the whole-sentence maximum entropy (ME) framework. In the proposed discriminative ME (DME) model, we exploit an integrated linguistic and acoustic model, which properly incorporates the features from the n-gram model and the acoustic log-likelihoods of target and competing models. Through the constrained optimization of the integrated model, ...
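The snippet describes an integrated model whose features include n-gram scores and acoustic log-likelihoods of competing hypotheses. As a loose illustration only, and not the cited paper's formulation, a log-linear combination of such scores normalized over a list of competing hypotheses can be sketched as below. The feature names `lm` and `ac`, the weights, the scores, and the N-best setting are all hypothetical.

```python
import math


def nbest_posteriors(hyps, weights):
    """Hypothetical log-linear combination: each hypothesis is a dict of feature
    values (e.g. an n-gram log-probability and an acoustic log-likelihood); its
    posterior is the softmax of sum_k weights[k] * value over the competing list."""
    scores = [sum(weights[k] * v for k, v in h.items()) for h in hyps]
    top = max(scores)                              # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)                                  # normalizer over competitors
    return [e / z for e in exps]


hyps = [
    {"lm": -12.0, "ac": -340.0},   # made-up scores for hypothesis A
    {"lm": -10.5, "ac": -345.0},   # made-up scores for hypothesis B
]
print(nbest_posteriors(hyps, {"lm": 1.0, "ac": 0.1}))
```

With these made-up numbers, hypothesis B scores -45 against A's -46, so B receives a posterior of about 0.73. In a discriminative setup the weights would be trained rather than fixed by hand.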
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy continues to improve, there is still a considerable gap between their accuracy and human recognition ability. This is partly due to high speaker variability in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
A comparative study on maximum entropy and discriminative training for acoustic modeling in automatic speech recognition
While Maximum Entropy (ME) based learning procedures have been successfully applied to text-based natural language processing, there have been only a few investigations of using ME for acoustic modeling in automatic speech recognition. In this paper we show that the well-known Generalized Iterative Scaling (GIS) algorithm can be used as an alternative method to discriminatively train the parameters ...
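The snippet mentions the Generalized Iterative Scaling (GIS) algorithm. In its textbook form, GIS requires every (o, s) pair to have the same total feature value C, which is usually arranged by appending a slack feature, and then updates each weight by λ_i ← λ_i + (1/C) log(E_emp[f_i] / E_model[f_i]). The sketch below implements that textbook update for a conditional model p(s | o); the function names and the random toy data are illustrative assumptions, and it does not reproduce the discriminative training setup of the cited paper.

```python
import numpy as np


def probs(lam, F):
    """p(s | o) = softmax over labels s of sum_i lam_i f_i(o, s); F is (N, S, I)."""
    scores = F @ lam
    scores = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)


def add_slack(F):
    """GIS needs sum_i f_i(o, s) = C for every (o, s): append a slack feature C - f#."""
    fsum = F.sum(axis=2)
    C = fsum.max()
    return np.concatenate([F, (C - fsum)[..., None]], axis=2), C


def train_gis(F, y, iters=300):
    """Generalized Iterative Scaling for a conditional maxent model (textbook form)."""
    F, C = add_slack(F)
    n = len(y)
    emp = F[np.arange(n), y].sum(axis=0)  # empirical feature counts
    lam = np.zeros(F.shape[2])
    for _ in range(iters):
        model = np.einsum("ns,nsi->i", probs(lam, F), F)  # expected counts under the model
        ok = (emp > 0) & (model > 0)  # unseen features are skipped in this sketch
        lam[ok] += np.log(emp[ok] / model[ok]) / C
    return lam, F


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    F = rng.integers(0, 2, size=(200, 3, 5)).astype(float)  # hypothetical binary features
    y = rng.integers(0, 3, size=200)                         # hypothetical labels
    lam, Fs = train_gis(F, y)
    print(lam)
```

Compared with IIS, GIS trades a simpler closed-form update for the constant-total requirement; a larger C means smaller steps and typically slower convergence.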